Towards Language Independent NE Identification in the context of Wikipedia

Abstract

Named Entity Identification (NEI), also called Named Entity Recognition, is a key component of most Information Extraction tasks. All existing approaches to NEI use extensive language-specific resources. This paper deals with the problem of multilingual NEI and explains the need to address it in a language-independent fashion. We focus on less-resourced languages such as Hindi and Tamil, which lack widely available language resources. The inherent structure of Wikipedia articles in multiple languages is exploited for NEI in these languages. The other major contribution of this work is to extend identification from the phrase level to the word level, under the intuition that this would increase the coverage of named entities. We evaluate our approach on comparable lists of NEs in Hindi, Tamil and English that were manually collected from the Named Entity Workshop (NEWS).

Related resources

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human-level intelligence, it becomes necessary to use knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM, and it requires a rich repository of knowledge. A recen...

Named Entity Corpus Construction using Wikipedia and DBpedia Ontology

In this paper, we propose a novel method to automatically build a named entity corpus based on the DBpedia ontology. Most named entity recognition systems require time-consuming and effortful annotation to produce training data. Work on NER has thus far been limited to certain languages, such as English, that are resource-abundant in general. As an alternative, we suggest that the NE corpus gene...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

Why Figurative Language: Perceived Discourse Goals for Metaphors and Similes by L2 Learners

The goal of this study was to investigate the kinds of discourse goals that Iranian EFL learners perceive as the most probable reasons behind the utterance of figurative language, metaphors and similes, with reference to 4 independent variables of Figure Type (Metaphor or Simile), Tenor Concreteness (Concrete or Abstract), Context (List Format or Story), and Modality (Oral, Written, and Both). ...

Mining Transliterations from Wikipedia using Dynamic Bayesian Networks

Transliteration mining is aimed at building high quality multi-lingual named entity (NE) lexicons for improving performance in various Natural Language Processing (NLP) tasks including Machine Translation (MT) and Cross Language Information Retrieval (CLIR). In this paper, we apply two Dynamic Bayesian network (DBN)-based edit distance (ED) approaches in mining transliteration pairs from Wikipe...


Publication date: 2010